Frame decimation for structure from motion

ABSTRACT

A preprocessing mechanism automatically processes a video sequence in order to obtain a set of views suitable for structure from motion (SaM) processing. The preprocessor employs a frame decimation algorithm which removes redundant frames within the video sequence based on motion estimation between frames and a sharpness measure. The motion estimation can be performed either globally or locally. The preprocessor can be configured to process static video sequences (i.e., previously acquired/captured) to select a minimal subsequence of sharp views from the video sequence, or to process the frames of a video sequence as they are being captured to discard frames which are redundant. In either configuration, the goal of the preprocessor is to select a minimal sequence of sharp views tailored for SaM processing.

FIELD OF THE INVENTION

[0001] The present invention relates to the field of three-dimensional (3D) modeling. More particularly, the present invention relates to the estimation of 3D structure from the images of a video sequence.

BACKGROUND

[0002] In the computer vision community, research is currently very active regarding structure from motion, particularly the estimation of 3D structure from the images of a video sequence. The interest of the Structure from Motion (SaM), also referred to as Structure and Motion, branch of computer vision has recently shifted to developing reliable and practical SaM algorithms and to building systems which incorporate such algorithms. Further, special interest has been devoted to developing systems which are capable of processing video images directly from an initially uncalibrated camera to automatically produce a three-dimensional graphical model. Great advances have been made towards these goals, as reflected in the number of SaM algorithms which have been developed. Examples include the algorithm disclosed by Fitzgibbon et al. in “Automatic Camera Recovery for Closed or Open Loop Image Sequences”, Proc. ECCV 1998, and the algorithm discussed by Pollefeys et al. in “Self-calibration and Metric Reconstruction in Spite of Varying and Unknown Internal Camera Parameters”, IJCV, August 1999, both of which are incorporated herein by reference.

[0003] Typical structure from motion algorithms match points between images that are projections of the same point in space. This in turn enables triangulation of depth, in the same way as the human brain performs stereo vision. The result of SaM processing is a three-dimensional textured model of the structure seen in the images. The position and calibration of the projections that produced the images can also be retrieved. However, many of the SaM algorithms perform best when supplied with a set of sharp, moderately interspaced still images, rather than with a raw video sequence. Therefore, choosing a subset of frames from a raw video sequence can produce a more appropriate input to these algorithms and thereby improve the final result.

[0004] One way to obtain a smaller set of views (i.e., a subset of frames) is simply to use a lower frame rate than the one produced by the camera. However, this is inadequate for several reasons. First, it can lead to unsharp frames being selected over sharp ones. Second, it typically means that an appropriate frame rate for a particular shot has to be guessed by the user or, even worse, predefined by the system. In general, the motion between frames has to be fairly small to allow automatic matching, while significant parallax and a large baseline are desirable to get a well-conditioned set of views (i.e., sequence of frames). If the frame rate is too low, matching between the images can be difficult if not impossible. However, with high frame rates (e.g., the full frame rate of a video camera) memory is quickly and sometimes needlessly consumed. Further, higher frame rates, while necessary in some situations, often result in an abundance of similar views, which can produce problems not only in terms of memory requirements, but in terms of processing requirements and/or numerical stability as well. Accordingly, with high frame rates an unnecessarily large set of views is produced, and with low frame rates the connectivity between frames is jeopardized. In fact, the appropriate frame rate depends on the motion and parallax of the camera and can therefore vary over a sequence.

[0005] In practice, it is impossible to know in advance the appropriate frame rate at which a handheld video sequence should be grabbed, inasmuch as the appropriate frame rate depends on the motion of the camera and the imaged structure. Furthermore, the appropriate frame rate can vary within a sequence as well as from one sequence to another. Therefore, additional mechanisms are necessary in order to incorporate raw video sequences into a fully working SaM system.

SUMMARY OF THE INVENTION

[0006] As a solution to the above described problems, a mechanism is provided that produces a sparse but sufficient set of views suitable for SaM processing.

[0007] In exemplary embodiments, the preprocessing mechanism (herein referred to as the preprocessor) is configured to process previously acquired (i.e., captured) video sequences to select a minimal subsequence of sharp views from the video sequence.

[0008] In other exemplary embodiments, the preprocessor is configured to process a video sequence as it is being captured to discard frames which are redundant, thereby providing a minimal sequence of sharp views tailored for SaM processing.

[0009] Acquiring structure from motion using video sequences, especially handheld video sequences, is more practical using the preprocessor of the present invention. For example, the relatively expensive SaM processing noted above can be performed on a smaller subsequence of views provided by the preprocessor. Further, the tedious manual tasks required to be performed on the sequences prior to SaM processing, such as segmentation of the raw video material into shots and the choice of frame rate for each shot, are automated using the preprocessor.

BRIEF DESCRIPTION OF THE DRAWINGS

[0010] A more complete understanding of the present invention may be derived by referring to the detailed description and claims when considered in connection with the Figures, wherein like reference numbers refer to similar items throughout the Figures, and:

[0011] FIG. 1 shows a block diagram of a preprocessing device according to an embodiment of the present invention;

[0012] FIG. 2 shows a flow chart describing a frame decimation algorithm according to an embodiment of the present invention;

[0013] FIG. 3 shows a flow chart describing a frame decimation algorithm according to an alternative embodiment of the present invention;

[0014] FIG. 4 shows a flow chart describing a second frame decimation algorithm according to an alternative embodiment of the present invention;

[0015] FIG. 5 shows a flow chart describing a third frame decimation algorithm according to an alternative embodiment of the present invention;

[0016] FIG. 6 shows a flow chart describing a fourth frame decimation algorithm according to an alternative embodiment of the present invention;

[0017] FIG. 7 shows the final reconstruction results for the sequence House;

[0018] FIG. 8 shows the final reconstruction results for the sequence Basement;

[0019] FIG. 9 shows shot boundary detection results on the sequence Sceneswap;

[0020] FIG. 10 shows shot boundary detection results on the 25 Hz sequence TriScene;

[0021] FIG. 11 shows shot boundary detection results on the sequence Stove at multiple frame rates;

[0022] FIG. 12 shows the final reconstruction results for the sequence Stove;

[0023] FIG. 13 shows the final reconstruction results for the sequence Room;

[0024] FIG. 14 shows the final reconstruction results for the sequence David from the side view;

[0025] FIG. 15 shows the final reconstruction results for the sequence David from the top view;

[0026] FIG. 16 shows selected frames from the test video sequences;

[0027] FIG. 17 shows the data for various test sequences.

DETAILED DESCRIPTION OF THE INVENTION

[0028] In the following description, for purposes of explanation and not limitation, specific details are set forth, such as particular circuits, circuit components, techniques, steps, etc., in order to provide a thorough understanding of the present invention. However, it will be apparent to one of ordinary skill in the art that the present invention may be practiced in other embodiments that depart from these specific details. In other instances, detailed descriptions of well-known methods, devices, and circuits are omitted so as not to obscure the description of the present invention with unnecessary detail.

[0029] These and other aspects of the invention will now be described in greater detail in connection with a number of exemplary embodiments. To facilitate an understanding of the invention, many aspects of the invention are described in terms of sequences of actions to be performed by elements of a computer system. It will be recognized that in each of the embodiments, the various actions could be performed by specialized circuits, by program instructions being executed by one or more processors, or by a combination of both. Moreover, the invention can additionally be considered to be embodied entirely within any form of computer readable storage medium having stored therein an appropriate set of computer instructions that would cause a processor to carry out the techniques described herein. Thus, the various aspects of the invention may be embodied in many different forms, and all such forms are contemplated to be within the scope of the invention.

[0030] In general, the present invention relates to frame decimation. More particularly, the present invention relates to a preprocessor configured to perform frame decimation in order to select a sequence of frames with appropriate amounts of motion between neighboring frames within the sequence for SaM processing.

[0031] An ideal preprocessor is idempotent. An operator T is called idempotent if T²=T. In other words, applying the preprocessor twice should yield the same result as applying it once. This is a quality possessed by, for example, ideal histogram equalization or ideal bandpass filtering. Another desirable property of the preprocessor is that the frame decimation algorithm should produce similar output at all sufficiently high input frame rates (i.e., at all frame rates higher than the minimum required to allow automatic matching between frames). Furthermore, the preprocessor should not significantly affect data that does not need preprocessing.

[0032] FIG. 1 shows a block diagram of a preprocessor 100 according to an embodiment of the present invention. Data buffer 104 receives frames, or sequences of frames, from video source 102. Video source 102 may be static or dynamic. For example, video source 102 can be a data store wherein previously captured video sequences are stored and subsequently fed into data buffer 104. Alternatively, video source 102 can be a handheld camera wherein video is fed directly into data buffer 104. Central Processing Unit (CPU) 106 executes the frame decimation algorithm 112, stored in random access memory (RAM) 108, prior to providing the frames received in data buffer 104 to SaM system 116 or data storage 114. SaM system 116 can be any type of structure from motion algorithm, device, or system, now known or later developed. The preprocessor 100 can be integrated directly into a video camera, a video grabbing chip, or a user terminal, such as a laptop computer, mobile phone, video phone or home computer.

[0033] According to one embodiment of the present invention, the preprocessor is used to process static, or previously captured, video sequences in order to acquire a subsequence of frames which are suited for SaM processing. A first step in preparing a previously captured video sequence for SaM preprocessing is to detect the shot boundaries, if any. Shot boundaries occur when the camera has been stopped and then started again at a new position. The video sequence could be partitioned into shots by the camera, but in practice this is not always the case, specifically when the camera is a general purpose handheld video camera. Further, with large amounts of video (e.g., a raw video sequence from a handheld camera), it is rather tedious to start and stop frame grabbing to partition the material into shots. Therefore, according to one embodiment of the present invention illustrated in FIG. 2, the shot boundaries are derived automatically using the preprocessor.

[0034] As shown in FIG. 2, a video sequence is received into the buffer from the video source at step 202. At step 204, the preprocessor sequentially processes the frames of the received sequence to determine the shot boundaries within the sequence, if any. Shot boundaries are detected, at step 204, by evaluating the correlation between adjacent frames after global motion compensation (described below). If the correlation is below a threshold (e.g., t₁=0.75), the second frame is declared the beginning of a new shot. The correlation can be measured by the same mean square measure that is used for the motion estimation; however, the normalized correlation coefficient is preferred, as it yields a more intuitively interpretable value.
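The thresholding of step 204 can be sketched as follows. This is a minimal illustration only, not the implementation of the embodiment: it assumes each frame is a flat list of pixel intensities and that global motion compensation has already been applied, and the function names `normalized_correlation` and `shot_starts` are hypothetical.

```python
from math import sqrt

def normalized_correlation(a, b):
    """Normalized correlation coefficient of two equally sized frames (flat lists)."""
    n = len(a)
    mean_a = sum(a) / n
    mean_b = sum(b) / n
    cov = sum((p - mean_a) * (q - mean_b) for p, q in zip(a, b))
    var_a = sum((p - mean_a) ** 2 for p in a)
    var_b = sum((q - mean_b) ** 2 for q in b)
    if var_a == 0 or var_b == 0:
        return 0.0  # assumption: a constant frame is treated as uncorrelated
    return cov / sqrt(var_a * var_b)

def shot_starts(frames, t1=0.75):
    """Indices of frames that begin a new shot; frame 0 always starts one."""
    starts = [0]
    for i in range(1, len(frames)):
        # Motion compensation is assumed to have been applied beforehand.
        if normalized_correlation(frames[i - 1], frames[i]) < t1:
            starts.append(i)
    return starts
```

Because the coefficient is bounded between -1 and 1, a fixed threshold such as t₁=0.75 has the same meaning regardless of image brightness or contrast, which is one way to read the remark that it is more intuitively interpretable.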

[0035] If shot boundaries are detected, the received sequence is divided into subsequences based on the shot boundaries (i.e., each subsequence corresponds to a shot sequence) at step 206. Control then proceeds to step 208, wherein processing continues separately for each subsequence (or for the received sequence if no shot boundaries were detected). At step 210, the subsequence is processed to identify redundant frames. The redundant frames, if any, are then discarded at step 212. If additional subsequences exist, then control returns to step 208.

[0036] With regard to the detection of shot boundaries at step 204, it should be mentioned that since a video sequence is discretely sampled in time, a shot boundary is not strictly defined. Therefore, automatic detection of shot boundaries can be done more reliably at high frame rates, since the difference between a discrete swap of the camera or view and a large motion diminishes towards lower frame rates.

[0037] The processing of steps 208-214 is based on a rough global estimation of the rotational camera motion between frames and a sharpness measure. The global motion estimation can be calculated using the initial step of a coarse to fine, optical flow based, video mosaicing algorithm, similar to the algorithm disclosed by Sawhney et al. in “Robust Video Mosaicing through Topology Inference and Local to Global Alignment” ECCV 1998, or by Szeliski et al. in “Creating Full View Panoramic Image Mosaics and Environment Maps” Proc. SIGGRAPH 1997, both of which are incorporated herein by reference. The motivation for using a flow based approach over a feature based approach in this case is that the flow based approach also works for significantly unsharp frames, and it is easy to obtain fast approximations by downsampling the frames (i.e., decreasing the number of pixels). The motion model is an arbitrary rotation of the camera around the center of projection and an arbitrary change of linear calibration. Assuming also a rigid world, this is equivalent to a homographic mapping H, represented by a 3×3 matrix, between the homogeneous image coordinates x₁ and x₂ of the first and second frame as

$x_{2} \cong H x_{1},$

[0038] where ≅ denotes equality up to scale. Both images are downsampled to a small size, for example, 50×50 pixels. To avoid problems due to altered lighting and overall brightness, both images are also normalized to have zero mean and unit standard deviation. The mapping H has eight degrees of freedom and should be minimally parameterized. As only small rotation is expected, this can be done by setting H₃₃=1. The minimization criterion applied to the estimation of H is the mean square residual. Alternative measures exist which provide more accurate results; however, the above is appropriate for obtaining a rough estimation quickly. The mean square residual R between the image functions ƒ₁ and ƒ₂, using H for the correspondence, is

$R(f_{1}, f_{2}, H, \Theta) = \frac{1}{\#(\Theta)} \sum_{x \in \Theta} \left( f_{2}(Hx) - f_{1}(x) \right)^{2}.$

[0039] Here, Θ is all or a subset of the set Θ_(A) of pixels in the first image that are mapped into the second image and #(Θ) is the number of elements of Θ. Larger sets than Θ_(A) are also possible if the image functions are defined beyond the true image domain by some extension scheme. In this case, Θ was chosen to be the whole image, except for a border of width d, which is a maximal expected disparity. The unit matrix is used as the initial estimate of H. Then, R is minimized by a non-linear least squares algorithm such as Levenberg-Marquardt (see Press et al., “Numerical Recipes in C” Cambridge University Press 1998).
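To make the residual concrete, the following sketch evaluates R for a given H. It is an illustration under stated assumptions rather than the embodiment itself: images are lists of rows, f₂(Hx) is sampled at the nearest pixel (the source does not specify an interpolation scheme), pixels that map outside the second image are skipped, and the name `mean_square_residual` is hypothetical. The minimization over H is not shown.

```python
def mean_square_residual(f1, f2, H, d):
    """Mean square residual R(f1, f2, H, Theta).

    Theta is the first image minus a border of width d, restricted to the
    pixels whose mapped position falls inside the second image.
    H is a 3x3 homography given as nested lists.
    """
    height, width = len(f1), len(f1[0])
    total, count = 0.0, 0
    for y in range(d, height - d):
        for x in range(d, width - d):
            w = H[2][0] * x + H[2][1] * y + H[2][2]
            if w == 0:
                continue  # point maps to infinity
            # Dehomogenize and round to the nearest pixel (an assumption).
            u = round((H[0][0] * x + H[0][1] * y + H[0][2]) / w)
            v = round((H[1][0] * x + H[1][1] * y + H[1][2]) / w)
            if 0 <= u < width and 0 <= v < height:
                total += (f2[v][u] - f1[y][x]) ** 2
                count += 1
    return total / count if count else float("inf")
```

With the unit matrix as H, the function returns the plain mean square difference of the two images over Θ, which is the starting point of the minimization described above.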

[0040] The measure of image sharpness can be a mean square of the horizontal and vertical derivatives, evaluated as finite differences. More exactly,

$S(f, I) = \frac{1}{2\#(I)} \sum_{(x,y) \in I} \left[ \left( f(x+1,y) - f(x-1,y) \right)^{2} + \left( f(x,y+1) - f(x,y-1) \right)^{2} \right]$

[0041] where I is the whole image domain except for the image boundaries and x, y are image coordinates. This measure is not used in any absolute sense, but only to measure the relative sharpness of similar images.
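The sharpness measure translates almost directly into code. The sketch below assumes the image is a list of rows of intensities and takes I to be all pixels except a one-pixel boundary; the function name `sharpness` is hypothetical.

```python
def sharpness(f):
    """Mean square of central finite differences over the image interior."""
    height, width = len(f), len(f[0])
    if height < 3 or width < 3:
        raise ValueError("image must be at least 3x3 to have an interior")
    total, count = 0.0, 0
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            dx = f[y][x + 1] - f[y][x - 1]  # horizontal difference
            dy = f[y + 1][x] - f[y - 1][x]  # vertical difference
            total += dx * dx + dy * dy
            count += 1
    return total / (2 * count)
```

Since the value depends on image content as well as blur, it is only meaningful for comparing similar images, as stated above.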

[0042] While the shot boundary detection of steps 204 and 206 is performed in a purely sequential manner, the algorithm for selecting key frames in steps 208-214 operates in a batch mode. Each frame in the subsequence is traversed in order of increasing sharpness and the redundant frames are deleted at step 212.

[0043] A redundant frame is defined as a frame which is not essential for the connectivity of the sequence. For example, consider the frame F_(i) belonging to the subsequence {F_(n)}^(N)_(n=1) of frames. If i=1 or i=N (i.e., the first or last frame of the sequence), the frame is not redundant. Otherwise, a global motion estimation (as discussed above) is performed between frame F_(i−1) and frame F_(i+1). If the motion estimation yields a final correlation coefficient above a threshold (e.g., t₂=0.95) and the estimated mapping H does not violate the maximum expected disparity d at any point, the frame F_(i) is redundant. The value of d is ten percent of the image size, which is half of the maximum disparity expected by the SaM algorithm.

[0044] This process is repeated for each frame in the sequence, and frames are deleted until further deletions result in too high a discrepancy between neighboring frames. The discrepancy that prevents a deletion can be either a violation of the disparity constraint or significant parallax that causes the global motion estimation, with the assumption of camera rotation, to break down. In the latter case, the material has become suitable for a SaM algorithm. In the former case, the material is ready for SaM or possibly mosaicing.
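The batch pass described above can be sketched as a single traversal in order of increasing sharpness. The sketch is illustrative only: `sharpness` and `is_connected` are hypothetical callbacks supplied by the caller, where `is_connected(a, b)` stands in for the test that global motion estimation between the two frames yields a correlation above t₂ without violating the disparity limit d.

```python
def decimate_batch(frames, sharpness, is_connected):
    """Return the indices of the frames that survive one deletion pass."""
    kept = list(range(len(frames)))
    # Visit the least sharp frames first so they are the first to be deleted.
    for idx in sorted(kept, key=lambda i: sharpness(frames[i])):
        pos = kept.index(idx)
        if pos == 0 or pos == len(kept) - 1:
            continue  # the first and last frames are never redundant
        before = frames[kept[pos - 1]]
        after = frames[kept[pos + 1]]
        # The frame is redundant if its current neighbors still connect.
        if is_connected(before, after):
            kept.pop(pos)
    return kept
```

Note that the neighbors are looked up in the current list of surviving frames, so each deletion widens the gap that later candidates must bridge.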

[0045] In another embodiment of the present invention, the preprocessor is implemented to run in real-time. The preprocessor can adapt to the motion of the camera and avoid any undue assumptions about the input frame rate. Furthermore, unsharp frames caused by bad focus, motion blur, etc., or a series of frames with low interdisparity, can be discarded at an early stage.

[0046] The preprocessor together with the camera then becomes an appropriate rate camera. When the camera moves slower, fewer frames are produced, while still maintaining the connectivity of the sequence. Longer sequences can therefore be acquired before running out of RAM when using digital cameras or the like. It also improves the chances of storing the sequence to disk in real-time, which cannot be done with full rate video and present user terminal disk speeds. In fact, this could be done in conjunction with a signal to the acquiring user to slow down the camera motion when memory is running out.

[0047] In order to configure the preprocessor to use frame decimation to avoid saving or storing the complete video sequence during the capture of the video sequence (i.e., real-time frame decimation), an algorithm has to be chosen that uses only a limited number of recent frames to perform frame decimation (as opposed to the batch algorithm described above with respect to the static video, which uses all of the frames in the sequence). The idea is to perform the frame decimation in real-time so that only a fraction of the new incoming frames have to be stored in memory or saved to disk.

[0048] As with the static frame decimation algorithms discussed above, the real-time frame decimation algorithms described below use an image sharpness measure and a motion estimation algorithm. The difference between two images (i.e., frames) after motion compensation could be measured either globally or locally. If it is measured locally, it is reasonable to consider two frames to be different when any local part differs significantly. The sharpness measure could also be disregarded and the frames traversed in any order.

[0049] FIG. 3 illustrates a real-time frame decimation algorithm according to a second embodiment. As shown in FIG. 3, which begins at step 300, a new frame is received from the video source at step 301. At step 303, if the new frame is the first frame of a sequence, then control proceeds to step 305 where the new frame is accepted (i.e., retained in the buffer); thereafter control returns to step 301 where a new frame is received. If the new frame is not the first frame of the sequence, then control proceeds to step 307, where a motion estimate from the accepted frame (i.e., the other frame in the buffer) to the new frame is calculated. At step 309, the motion estimate calculated at step 307 is compared to a predefined value. If the motion estimate is less than the predefined value (i.e., there is little difference between the two frames), then the new frame is discarded at step 311 and control returns to step 301. If the motion estimate is greater than the predefined value, then the previously accepted frame is stored and the new frame is accepted (i.e., retained in the buffer). If the new frame is not the last frame of the sequence, control returns to step 301.
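The two-frame buffer of FIG. 3 can be sketched as a generator. This is a hedged illustration: `motion` is a hypothetical callback returning a scalar motion estimate between two frames, `threshold` is the predefined value, and emitting the last accepted frame when the input ends is an assumption not stated in the flow chart.

```python
def decimate_stream(frames, motion, threshold):
    """Yield stored frames while holding only one accepted frame at a time."""
    accepted = None
    for frame in frames:
        if accepted is None:
            accepted = frame          # first frame is always accepted
        elif motion(accepted, frame) > threshold:
            yield accepted            # store the previously accepted frame
            accepted = frame          # the new frame becomes the accepted one
        # otherwise the new frame is discarded
    if accepted is not None:
        yield accepted                # assumption: flush at end of sequence
```

For example, with frames represented by scalar camera positions, `list(decimate_stream([0, 1, 2, 5, 6, 9], lambda a, b: abs(b - a), 2))` keeps the frames at positions 0, 5 and 9.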

[0050] Frame decimation algorithms, such as the one illustrated in FIG. 3, are advantageous in systems where the memory resources are small, because the buffer only stores two frames at a time, i.e., the last accepted frame and the new frame. However, in systems where more robust memory and processing power are available, alternative algorithms can be employed, for example those illustrated in FIG. 4 and FIG. 5.

[0051] FIG. 4 shows a flow chart describing a second algorithm for use in a real-time frame decimation. As shown in FIG. 4, which begins at step 400, a new frame is received from the video source at step 402. If the new frame is the first frame of a sequence, then control proceeds to step 406 where the new frame is accepted (i.e., retained in the buffer); thereafter control returns to step 402. If the new frame is not the first frame of the sequence, then control proceeds to step 408, where a global motion estimate from all the previously accepted frames (i.e., all the frames in the buffer) to the new frame is calculated. At step 410, the global motion estimate, calculated at step 408, is compared to a predefined value. If the global motion estimate is greater than the predefined value (i.e., there is a significant difference between the new frame and the previously accepted frames), then the new frame is accepted at step 412 and control returns to step 402. If the global motion estimate is less than the predefined value, then the new frame is discarded at step 414. If the new frame is not the last frame of the sequence, control returns to step 402.

[0052] FIG. 5 shows a flow chart describing a third algorithm for use in a real-time frame decimation. As shown in FIG. 5, which begins at step 500, a new frame is received from the video source at step 501. If the new frame is the first frame of a sequence, then control proceeds to step 505 where the new frame is accepted (i.e., retained in the buffer); thereafter control returns to step 501. If the new frame is not the first frame of the sequence, then control proceeds to step 507, where a motion estimate from each of the N previously accepted frames (i.e., every frame retained in the buffer) to the new frame is calculated. At step 509, the motion estimates, calculated at step 507, are compared to a predefined value. If any one of the motion estimates is greater than the predefined value (i.e., there is a significant difference between the new frame and any one of the previously accepted frames), then the new frame is accepted (i.e., added to the N frames retained in the buffer) at step 511 and control returns to step 501. Otherwise, the new frame is discarded at step 513. If the new frame is not the last frame of the sequence, control returns to step 501.
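The acceptance rule of FIG. 5, read literally, can be sketched as below. As before, `motion` and `threshold` are hypothetical stand-ins for the motion estimate and the predefined value, and the buffer is modeled as a plain list that grows with each accepted frame.

```python
def decimate_with_history(frames, motion, threshold):
    """Accept a frame if it differs enough from any previously accepted frame."""
    buffer = []
    for frame in frames:
        if not buffer:
            buffer.append(frame)      # first frame is always accepted
        elif any(motion(old, frame) > threshold for old in buffer):
            buffer.append(frame)      # significant difference to some frame
        # otherwise the new frame is discarded
    return buffer
```

Compared with the two-frame scheme, this variant needs one motion estimate per buffered frame for every incoming frame, which is why it suits systems with more memory and processing power.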

[0053] To simplify the motion estimation process, the motion estimation can be performed as a form of tracking from the most recent accepted frame across any rejected frames to the newest frame. Then the motion estimate is updated for every new frame by effectively performing a motion estimation between the most recent frame and the new frame (that is, the last two frames), as illustrated in FIG. 6.

[0054] As shown in FIG. 6, which begins with step 600, a new frame is received from the video source at step 602. If the new frame is the first frame of a sequence, then control proceeds to step 606 where the new frame is accepted (i.e., retained in the buffer); thereafter control returns to step 602. If the new frame is not the first frame of the sequence, then control proceeds to step 608, where it is determined whether or not a previous motion estimate has been calculated. If a previous motion estimate does not exist, then control proceeds to step 610, where a motion estimate between the new frame and the previously accepted frames is calculated. At step 612, if the motion estimate is greater than a predefined value, then, at step 614, the motion estimate and the new frame are accepted (i.e., stored in the buffer). Otherwise, the new frame and the motion estimate calculated at step 610 are discarded at step 616 and control returns to step 602. If, at step 608, a previous motion estimate exists, then the previous motion estimate is refined to include the new frame at step 618. At step 620, the refined motion estimate is compared to a predefined value. If the refined motion estimate is greater than the predefined value, then the refined motion estimate and the new frame are accepted at step 622 and control returns to step 602. Otherwise the refined motion estimate and the new frame are discarded at step 624. If the new frame is not the last frame of the sequence, then control returns to step 602.
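The tracking idea can be illustrated with a deliberately simplified model in which motion is a scalar that accumulates additively from frame to frame. This is an assumption made for brevity: in the embodiment the estimate would be a mapping refined across frames, and the names `track_and_decimate` and `step_motion` are hypothetical.

```python
def track_and_decimate(frames, step_motion, threshold):
    """Accumulate frame-to-frame motion since the last accepted frame."""
    accepted = []
    previous = None
    accumulated = 0.0
    for frame in frames:
        if previous is None:
            accepted.append(frame)            # first frame is always accepted
        else:
            # Refine the running estimate using only the last two frames.
            accumulated += step_motion(previous, frame)
            if abs(accumulated) > threshold:
                accepted.append(frame)
                accumulated = 0.0             # restart tracking from this frame
        previous = frame
    return accepted
```

Only the most recent frame and the running estimate need to be held between iterations, so the cost per incoming frame is a single estimate between neighboring frames.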

[0055] While the various frame decimation algorithms described above address various configurations of memory and processing power, they fail to address the speed of the storage device. Therefore, a mechanism is included in the system in order to alert the user (cameraman) when the storage device cannot keep up. The cameraman should then stop the camera motion for a period of time. The frame decimation scheme can as a result skip the majority (if not all) of the incoming frames and the storage device can catch up.

[0056] With respect to results obtained when the techniques described above are applied, attention is first turned to a theoretical result. The preprocessing algorithm is approximately idempotent and can be made perfectly idempotent by a modification. Instead of only executing one run over all frames to perform deletions, the process is repeated until no additional deletions occur. The algorithm is now perfectly idempotent. To see why, consider application of the preprocessor a second time. No shot boundaries will be detected, because all adjacent frames with a correlation less than t₁ after motion compensation were detected in the first pass and no new such pairs have been created by frame deletion, since t₂>t₁. Neither do any frame deletions occur during the second pass, since this was the stopping criterion for the first pass.
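The modification amounts to iterating a deletion pass to a fixed point. The sketch below shows this with a hypothetical `one_pass` callback and a toy pass, `drop_one`, that is not the decimation pass of the embodiment but is enough to show why a second application changes nothing.

```python
def run_until_stable(frames, one_pass):
    """Repeat a deletion pass until it removes nothing more."""
    current = list(frames)
    while True:
        reduced = one_pass(current)
        if len(reduced) == len(current):
            return reduced            # no deletions: fixed point reached
        current = reduced

def drop_one(seq):
    """Toy pass: delete the first interior item whose neighbors are within 2."""
    for i in range(1, len(seq) - 1):
        if abs(seq[i + 1] - seq[i - 1]) <= 2:
            return seq[:i] + seq[i + 1:]
    return seq

once = run_until_stable([0, 1, 2, 3, 4], drop_one)
twice = run_until_stable(once, drop_one)
```

Here `once` and `twice` are the same list, because the loop only terminates when a pass deletes nothing, and that same pass is the first thing a second application would run.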

[0057] In practice, extra deletion runs are not performed since they result in few deletions. When a frame F_(i) is considered for deletion during additional runs, correlation after compensation is calculated between two frames F_(i−1) and F_(i+1). These two frames are always at least as far apart in the sequence as the pair used during any of the preceding runs. New deletions therefore seldom occur.

[0058] Let us now turn to the practical experiments. The results of a preprocessor are not strictly measurable unless the type of subsequent processing is defined. The experiments were performed in conjunction with a feature based SaM algorithm. The algorithm takes a video sequence and automatically extracts a sparse representation in terms of points and lines of the observed structure. It also estimates camera position, rotation and calibration for all frames. The preprocessor was tested on approximately 50 sequences, most of them handheld with jerky motion and imperfect focus. The sequences listed in the table shown in FIG. 17 are discussed below. Some frames from the sequences are shown in FIG. 16 and outlined in the Table of FIG. 17.

[0059] As was mentioned in the introduction, the preprocessor should not significantly change data that does not need preprocessing. This was tested in practice by applying the preprocessor and subsequent SaM system to sequences with sharp, nicely separated frames and no shot boundaries. Final reconstruction results for the sequence House are shown in FIG. 7. The cameras are represented by an image plane with the actual image projected onto it and a line drawn to the center of projection. For this sequence, the preprocessor does not falsely detect any shot boundaries, nor does it remove any frames. In other words, it just propagates the input data to its output, which is exactly the desired behavior. The last camera to the right is completely off the true turntable track. This error unfortunately occurs when going from projective to Euclidean reconstruction, but has nothing to do with the preprocessor.

[0060] The final reconstruction result for the sequence Basement is depicted from above in FIG. 8. An autonomous vehicle acquired this sequence while moving through a corridor, turning slightly left. Again, the preprocessor did not falsely detect any shot boundaries. However, it deleted frames 3 and 7, which can in fact be seen as larger gaps in the camera trajectory. This happens because the forward motion does not cause enough parallax. It does not negatively affect the final result.

[0061] Experimental results of shot boundary detection on the sequence Sceneswap are shown in FIG. 9. This sequence consists of eleven shots, separated by shot boundaries after frames 72, 164, 223, 349, 423, 465, 519, 583, 619 and 681 (found manually). The threshold at 0.75 is shown as a solid line. Results are given at frame rates 2.5, 6.25 and 3.125 Hz. At all frame rates, the ten boundaries are found successfully and can be seen as ten groups of three markers below the detection threshold at the above mentioned frame numbers. At 2.5 and 6.25 Hz the detection is stable, with a correlation above 0.95 and 0.9, respectively, for all non-boundaries. This can be seen as a pattern at the top of the figure. At 3.125 Hz, however, the frame rate has dropped too low and five false responses occur, all marked with an arrowhead. Thus the importance of a high frame rate is illustrated.

[0062] A typical preprocessor result is shown in FIG. 10 for the 25 Hz sequence TriScene, with two correctly detected shot boundaries. The frames surviving the decimation are marked by triangles. Sharpness is on the vertical axis. Observe that local sharpness minima are avoided.

[0063] FIG. 11 illustrates how the preprocessor manages to make the system independent of the input frame rate, provided that this is sufficiently high. The result is for the 12.5 Hz sequence Stove, with a total of 107 frames. The sequence is handheld, with the camera moving in an arc in front of a kitchen stove. The sequence was subsampled to 50 lower frame rates and fed into the preprocessor. With very few input frames (<12), shot boundaries are falsely detected. With more than 30 input frames, however, this is no longer a problem and the number of output frames remains fairly constant at about 20. When fed with the full frame rate, the preprocessor removes about 80% of the frames and the SaM algorithm can then carry on to produce the reconstruction shown in FIG. 12. The beginning of the camera arc can be seen at the front of the figure. The camera then travels out of view and returns at the back of the figure.
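The subsampling and the quoted removal rate can be checked with a short sketch. The text does not state how the sequence was subsampled, so the evenly spaced selection below is an assumption, and the helper names `subsample_indices` and `removed_fraction` are hypothetical.

```python
def subsample_indices(total_frames, kept_frames):
    """Pick `kept_frames` roughly evenly spaced indices from 0..total_frames-1.

    Even spacing is an assumption; it imitates recording the same
    sequence at a lower frame rate.
    """
    if not 1 <= kept_frames <= total_frames:
        raise ValueError("kept_frames must be between 1 and total_frames")
    if kept_frames == 1:
        return [0]
    # Integer arithmetic keeps the indices strictly increasing and ends
    # exactly on the last frame.
    return [i * (total_frames - 1) // (kept_frames - 1)
            for i in range(kept_frames)]


def removed_fraction(total_frames, output_frames):
    """Fraction of the input frames that the decimation discarded."""
    return 1.0 - output_frames / total_frames
```

For the figures given above, `removed_fraction(107, 20)` is roughly 0.81, in line with the statement that about 80% of the frames are removed at the full frame rate.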

[0064] In FIG. 13, the reconstruction from the sequence Room is shown. This is a handheld sequence, where the camera moves forward through an office. Many frames are out of focus. At the beginning and the end of the sequence, the camera moves very little and only rotates slightly back and forth, which is not uncommon in raw video material. The preprocessor successfully removes most of these frames, which enables a reasonable trajectory of camera views to be extracted, although the structure for this sequence is still rather poor.

[0065] In FIG. 14 and FIG. 15, a side and a top view, respectively, are shown for the sequence David. This is a head and shoulders sequence, with a picture hanging on a wall behind. The person in the sequence acquired it by holding the camera with a stretched arm and performing an arched motion. Again, the motion of the center of projection is negligible between a couple of frames near the end of the arc. With the SaM system used in the experiments, this tends to ruin all the points of the face, causing a decapitation. With preprocessing, the troublesome frames are removed and the problem is eliminated.

[0066] In summary, exemplary embodiments of the invention described above confer multiple benefits. For example, using the preprocessor, the relatively expensive SaM processing can be performed on a smaller number of views. Another benefit is that some tedious manual tasks, such as segmentation of the raw video material into shots and the choice of frame rate for every shot, are automated through the use of the preprocessing mechanism. Furthermore, degeneracies due to insufficient motion can sometimes be avoided. Automatic processing can adapt to the motion and avoid any undue assumptions about the input frame rate. Furthermore, unsharp frames caused by bad focus, motion blur, etc. or a series of frames with low interdisparity can be discarded at an early stage.
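The early discarding of frames with low interdisparity can be pictured as a simple accept/reject loop. The sketch below is illustrative only and rests on assumptions: the caller supplies a `similarity` function standing in for the correlation coefficient produced by motion estimation, the sharpness measure is left out, and the name `decimate` and its parameters are hypothetical.

```python
def decimate(frames, similarity, threshold):
    """Return the indices of the frames that are kept.

    A frame counts as redundant, and is dropped, when its similarity to
    the most recently kept frame exceeds `threshold`. The first frame
    is always kept.
    """
    kept = []
    for index, frame in enumerate(frames):
        if not kept or similarity(frames[kept[-1]], frame) <= threshold:
            kept.append(index)
    return kept
```

Comparing each candidate only against the last kept frame means that slow motion builds up over several input frames until the similarity drops to or below the threshold. The number of kept frames then follows the amount of motion rather than the input frame rate.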

[0067] The invention has been described with reference to particular embodiments. However, it will be readily apparent to those skilled in the art that it is possible to embody the invention in specific forms other than those of the preferred embodiments described above. This may be done without departing from the spirit of the invention.

[0068] For example, the preprocessor 100 has been illustrated as a processing mechanism separate from the video capture and the SaM processing. However, this is not essential. Rather, the inventive techniques can be adapted for use within the video camera or within the SaM processing mechanism.

[0069] Thus, the preferred embodiment is merely illustrative and should not be considered restrictive in any way. The scope of the invention is given by the appended claims, rather than the preceding description, and all variations and equivalents which fall within the range of the claims are intended to be embraced therein.

What is claimed is:
 1. A method for preprocessing a video sequence, the method comprising the steps of: receiving the video sequence; and generating a set of views suitable for algorithmic processing by performing frame decimation on the video sequence.
 2. The method according to claim 1, wherein said frame decimation comprises the steps of: identifying redundant frames in the video sequence; and deleting any frames which are identified as redundant.
 3. The method according to claim 2, wherein the step of identifying redundant frames in the video sequence comprises the steps of: calculating whether a frame is essential for connectivity of the video sequence; and identifying the frame as redundant when the frame is determined not to be essential for connectivity of the video sequence.
 4. The method according to claim 2, wherein said step of identifying redundant frames comprises the steps of: determining a motion estimation between the frames in the video sequence; and identifying a frame as redundant if the motion estimation yields a final correlation coefficient above a predetermined threshold.
 5. The method according to claim 4, wherein the motion estimation is a global motion estimation.
 6. The method according to claim 4, wherein the motion estimation is a local motion estimation.
 7. The method according to claim 1 further comprising the steps of: determining shot boundaries of the video sequence; dividing the video sequence into at least one subsequence of frames, wherein each of the at least one subsequence of frames corresponds to a particular shot in the video sequence; identifying redundant frames in the at least one subsequence of frames; and deleting from the at least one subsequence of frames any frames which are identified as redundant.
 8. The method according to claim 7, wherein the shot boundaries are provided by the camera which captured the video sequence.
 9. The method according to claim 7, wherein the step of determining the shot boundaries comprises the steps of: correlating adjacent frames in the video sequence after global motion compensation; and identifying, for each pair of adjacent frames, the second frame in the pair as a beginning of a new shot based on the correlation between the frames in the pair.
 10. The method according to claim 1, wherein the video sequence is received from a video capture device in real-time.
 11. A method for capturing a video sequence, the method comprising the steps of: receiving video from a video capture device as a sequence of frames; for each frame in the sequence, determining whether or not to accept the frame; and storing the accepted frames in a storage device.
 12. The method according to claim 11, wherein the step of determining whether or not to accept a frame from the sequence of frames further comprises the steps of: determining whether or not the frame is redundant; and accepting the frame if it is determined not to be redundant.
 13. The method according to claim 12, wherein the step of determining whether or not the frame is redundant comprises the steps of: calculating a motion estimation between the frame and a previously accepted frame; and identifying the frame as redundant if the motion estimation yields a final correlation coefficient above a predetermined threshold.
 14. The method according to claim 12, wherein the step of determining whether or not the select frame is redundant comprises the steps of: calculating a motion estimation between the frame and all the previously accepted frames; and identifying the frame as redundant if the motion estimation yields a final correlation coefficient above a predetermined threshold.
 15. The method according to claim 11, further comprising the steps of: monitoring the rate at which accepted frames are provided to the storage device; and providing an indication to the user of the video capture device to decrease the motion of the camera, if the storage device is unable to process the accepted frames at the current rate.
 16. A method for processing a video sequence to produce a set of views suitable for Structure from Motion processing, the method comprising the steps of: receiving a frame; comparing the frame with at least one previously received frame; and storing the received frame in a storage device when the comparison indicates that a difference between the frame and the at least one previously received frame is greater than a predetermined amount.
 17. The method according to claim 16, wherein the difference indicates a motion between the at least one previously received frame and the received frame.
 18. The method according to claim 16, wherein the frame is received from a video capture device in real-time.
 19. The method according to claim 16, wherein the frame is received from a storage medium.
 20. A system for preprocessing a video sequence to produce a set of views suitable for Structure from Motion processing, said system comprising: a video sequence source; a storage medium; and a preprocessor, wherein the preprocessor is configured to perform frame decimation.
 21. The system of claim 20, wherein the preprocessor comprises a data buffer for receiving the video sequences.
 22. The system of claim 21, wherein the video sequence source is a video capture device.
 23. The system of claim 21, wherein the video sequence source is a memory device.
 24. The system of claim 20, wherein the storage device is a flash memory device.
 25. The system of claim 20, wherein the frame decimation further comprises the steps of: identifying redundant frames in the video sequence; and deleting from the video sequence any frames which are identified as redundant.
 26. The system of claim 20, wherein the frame decimation comprises the step of: receiving a frame; comparing the frame with at least one previously received frame; and storing the received frame in a storage device when the comparison indicates that a difference between the frame and the at least one previously received frame is greater than a predetermined amount.
 27. The system of claim 26, wherein the frame is received from a video capture device in real-time.
 28. The system of claim 22, wherein the video sequence is received as a sequence of frames from a video capture device in real-time.